Conceptual tests are widely used by physics instructors to assess students’ conceptual
understanding and compare teaching methods. It is common to look at the changes in students’ answers
between a pre-test and a post-test to quantify a transition in students’ conceptions. This is often
done by looking at the proportion of incorrect answers in the pre-test that change to correct answers
in the post-test (the gain) and the proportion of correct answers that change to incorrect answers
(the loss). By comparing theoretical predictions to experimental data on the Force Concept
Inventory, we show that Item Response Theory (IRT) predicts the observed gains and losses fairly
well. We then use IRT to quantify the changes in students’ answers in a test-retest situation when no
learning occurs and show that i) up to 25% of total answers can change due to the non-deterministic
nature of students’ answers and that ii) gains and losses can range from 0% to 100%. Still using IRT,
we highlight the conditions that a test must satisfy in order to minimize gains and losses when no
learning occurs. Finally, recommendations on the interpretation of such pre/post-test progression
with respect to the initial level of students are proposed.


I. INTRODUCTION 

Conceptual tests are widely used by physics instructors
to assess students’ conceptual understanding and compare
teaching methods. In particular, the Force Concept
Inventory P (FCI) evaluates students’ mastery of Newton’s
laws [2]. It consists of 30 multiple-choice questions in which
the incorrect answers are based on the answers most
frequently given by students in interviews. The FCI covers
several topics: kinematics, identification of forces
and Newton’s three laws pp. Instructors usually
use the raw score or the Hake gain [2] to evaluate students’
overall progression. Item Response Theory (IRT)
provides a more theoretically grounded measure of students’
progression Pi]. Over the past decade, IRT has been
applied with success to concept inventories, in particular
to the FCI [THTT]. A student’s raw score, or the
proficiency given by IRT, provides a global measure of the
acquisition of Newtonian concepts.

A closer look at students’ answers in a test-retest
situation has shown that while the total test score is
highly reliable, 31% of the students’ answers change from
test to retest, suggesting weak reliability for individual
answers [12]. Looking at how students’ answers change
between a pre-test (before instruction) and a post-test
(after instruction), using a database of more
than 13 000 students’ answers, Lasry et al. [13] revealed a
strong positive correlation between the initial score and
the proportion of incorrect answers on the pre-test that
were changed to correct answers on the post-test - the
gains. A symmetric result was found for the losses, i.e. the
proportion of correct answers on the pre-test that were
changed to incorrect answers on the post-test, which are
strongly and negatively correlated with the initial score.
This result suggests that students with a higher prior level
learn more and forget less than students with a lower prior level.

In this article we show that IRT can be used to
qualitatively predict these experimental data while offering
another interpretation of the previous results. The
observed correlation mainly comes from inherent properties
of the test rather than reflecting the level of progression of
students. We show in particular that the students’
proficiency progression, as obtained by IRT, is larger for
low-proficiency students, a conclusion opposite to the
previous interpretation.

The article is organized as follows: section II provides
the definitions of gains and losses; section III introduces IRT
and its underlying assumptions; section IV compares
the theory’s predictions with experimental data;
section V uses IRT to predict answer changes; finally,
sections VI and VII discuss and conclude this work.


II. GAINS AND LOSSES 

Consider the situation of students taking the same test
twice: the first time before instruction and the second
time after instruction. It is hoped that the score of
each student increases, so that some of the answers which
were initially wrong become correct. Following Lasry
et al. [13], we define the gain $G$ as the proportion of
incorrect answers on the pre-test that change to correct
answers on the post-test. Similarly, the loss $L$ is
defined as the proportion of correct answers on the pre-test
that change to incorrect answers on the post-test. We
introduce $IC_i$ as the proportion of students who
change from an incorrect ($I$) to a correct ($C$) answer
at question $i$ and $I_i$ as the proportion of initial
incorrect answers; $CI_i$ is defined symmetrically as the
proportion of students who change from a correct to an
incorrect answer. Gains and losses are then defined by
$G = \langle IC_i \rangle / \langle I_i \rangle$ and
$L = \langle CI_i \rangle / \langle C_i \rangle$, where
$\langle \cdot \rangle$ denotes the average over the questions
of the test. $C_i$ is the proportion of initial correct answers
to question $i$, so that $\langle C_i \rangle$ is
the average pre-test score of the students.

FIG. 1. Gain (blue) and loss (yellow) as a function of
pre-test score at the FCI. Points are measurements from a large
pool of students [13] and lines are theoretical predictions using
the question parameters of the IRT analysis obtained in [9].

FIG. 2. Item characteristic curves for questions 1 (dashed
line) and 13 (continuous line) of the FCI. Question
parameters are taken from [9].

Using data from more than 13,000 students’ answers on the Force
Concept Inventory (FCI), Lasry et al. [13] measured the
dependence of gains and losses on prior knowledge
(pre-test score). As shown in Fig. 1, students with higher
prior knowledge have a higher gain and a smaller loss than
students with lower prior knowledge. In order to
interpret these results, it is first necessary to draw the same
graph when no learning occurs, that is to say when the
same test is taken twice consecutively, with students
neither memorizing their previous answers nor
learning anything between the two tests. We show in the
next sections how IRT is able to answer this question.
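As an illustration of these definitions, the short Python sketch below computes $G$ and $L$ from two binary response matrices. The function name `gain_and_loss` and the toy data are assumptions made for this example only; they are not taken from the data of Lasry et al.

```python
from typing import List, Tuple


def gain_and_loss(pre: List[List[int]], post: List[List[int]]) -> Tuple[float, float]:
    """Return (G, L) from 0/1 response matrices indexed as [student][question].

    Pooling the counts over all students and questions is equivalent to the
    ratio of question averages <IC_i>/<I_i> (and <CI_i>/<C_i>), because every
    question is answered by the same number of students.
    """
    n_incorrect = n_correct = n_ic = n_ci = 0
    for pre_row, post_row in zip(pre, post):
        for before, after in zip(pre_row, post_row):
            if before == 0:
                n_incorrect += 1
                n_ic += after          # incorrect -> correct
            else:
                n_correct += 1
                n_ci += 1 - after      # correct -> incorrect
    if n_incorrect == 0 or n_correct == 0:
        raise ValueError("G or L is undefined for an all-correct or all-incorrect pre-test")
    return n_ic / n_incorrect, n_ci / n_correct


# Toy data (hypothetical): 2 students, 4 questions.
pre_answers = [[0, 0, 1, 1], [0, 1, 1, 0]]
post_answers = [[1, 0, 1, 0], [1, 1, 1, 1]]
G, L = gain_and_loss(pre_answers, post_answers)
print(f"G = {G:.2f}, L = {L:.2f}")
```

In this toy case three of the four initially incorrect answers become correct and one of the four initially correct answers becomes incorrect, so $G = 0.75$ and $L = 0.25$.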


III. THE ITEM RESPONSE THEORY 

Item Response Theory (IRT) belongs to the family of
latent trait models M-. In these models, each student
is described by a number of latent traits, also called
proficiencies. The answer of a student to a question is thought
of as the result of the interaction between the
capabilities of the person taking the test and the characteristics
of the test items. The score of a student on an item is
modeled by a probabilistic function of the student’s proficiencies
and the item’s characteristics. A substantial amount of
knowledge and skills is always necessary to give a
correct answer m, but in many cases a single proficiency
is sufficient to determine the student’s score. This is called
unidimensional Item Response Theory, but it is often
simply called IRT. This assumption was shown to be valid
for modeling students’ answers to the FCI [ais] and will be
assumed in the following.

Let us denote by $\theta$ the proficiency of a student. Each question
$i$ is modeled by a function $P_i(\theta)$ which describes the
probability that a student with proficiency $\theta$ answers
question $i$ correctly. The $P_i$ functions, called item characteristic
curves, are often assumed to be generic "S-shaped"
functions (see Fig. 2), called logistic functions, whose
variations characterize each question. In the three-parameter
item model, $P_i(\theta)$ is given by

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp\left[-1.7\, a_i (\theta - b_i)\right]}, \tag{1}$$

where $a_i$, $b_i$ and $c_i$ are the parameters of the question: $a_i$
is its discrimination power, $b_i$ its difficulty and $c_i$ the
probability of guessing. The parameters are estimated
by statistical techniques using a large pool of students’
answers. Other models exist, such as the two-parameter
model ($c_i = 0$), the Rasch model ($c_i = 0$ and $a_i = 1$) and
the non-parametric kernel smoothing approach [16]. All
these models have been applied to the FCI [ZHn]. For
instance, the $P_i$ functions for questions 1 and 13 of the FCI
are plotted in Fig. 2. Question 13 is more difficult than
question 1 ($b_{13} > b_1$), so its curve lies further to the right
of the graph. Its discrimination is also larger ($a_{13} > a_1$),
so that the S-shape is steeper. Finally, its guessing
parameter is lower ($c_{13} < c_1$), as seen from the value of $P_i$
when $\theta$ goes to $-\infty$.
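To make Eq. (1) concrete, here is a minimal Python sketch of the three-parameter curve. The two sets of parameters are invented for illustration (they are not the values of Ref. [9]); the function name `p_correct` is likewise an assumption of this example.

```python
import math


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter item characteristic curve of Eq. (1)."""
    x = 1.7 * a * (theta - b)
    # Numerically stable logistic: never exponentiates a large positive number.
    if x >= 0:
        logistic = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        logistic = e / (1.0 + e)
    return c + (1.0 - c) * logistic


# Hypothetical parameters (a, b, c), chosen only for illustration.
easy_item = (1.0, -1.0, 0.25)   # low difficulty, high guessing level
hard_item = (2.0, 1.0, 0.10)    # higher difficulty, steeper, lower guessing level

for theta in (-3.0, 0.0, 3.0):
    print(theta,
          round(p_correct(theta, *easy_item), 3),
          round(p_correct(theta, *hard_item), 3))
```

Two properties of Eq. (1) are easy to read off this sketch: at $\theta = b_i$ the probability is $(1 + c_i)/2$, and for very low proficiency it tends to the guessing level $c_i$.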

The true score (in %) of a group of students with
proficiency $\theta$ is given by $S(\theta) = \langle P_i(\theta) \rangle$.
Because of the probabilistic nature of IRT, the score $S(\theta)$ for a
given proficiency $\theta$ differs from the observed score of a student
with that proficiency, i.e. the number of correct answers
given by the student divided by the number of questions.
The true score $S(\theta)$ is only recovered as an average over
the individual observed scores of a large number of
equal-proficiency students. The observed score is also called the
raw score, and one strength of IRT is to convert this raw
score, which is a discrete bounded variable, into a
continuous unbounded variable $\theta$, which is assumed to be
an interval scale, i.e. a scale which can be used to
quantify a progression or a difference of proficiency between
students [4].
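The difference between true score and raw score can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the three items are made up, the seed and sample size are arbitrary, and each answer is drawn independently with probability $P_i(\theta)$.

```python
import math
import random
from typing import List, Tuple

Item = Tuple[float, float, float]  # (a_i, b_i, c_i)

# Hypothetical item parameters, for illustration only.
ITEMS: List[Item] = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (2.0, 1.0, 0.2)]


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter item characteristic curve of Eq. (1)."""
    x = 1.7 * a * (theta - b)
    if x >= 0:
        logistic = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        logistic = e / (1.0 + e)
    return c + (1.0 - c) * logistic


def true_score(theta: float, items: List[Item]) -> float:
    """S(theta): average of the P_i over the questions."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items) / len(items)


def simulate_raw_score(theta: float, items: List[Item], rng: random.Random) -> float:
    """One student's observed score: fraction of questions answered correctly."""
    n_correct = sum(rng.random() < p_correct(theta, a, b, c) for a, b, c in items)
    return n_correct / len(items)


rng = random.Random(0)  # fixed seed so that the run is reproducible
theta = 0.3
raw_scores = [simulate_raw_score(theta, ITEMS, rng) for _ in range(20000)]
mean_raw = sum(raw_scores) / len(raw_scores)
print(f"true score = {true_score(theta, ITEMS):.3f}, mean raw score = {mean_raw:.3f}")
print("distinct raw scores:", sorted(set(raw_scores)))
```

Each simulated student obtains one of only four possible raw scores, whereas the mean over many equal-proficiency students approaches the continuous true score.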

IV. IRT PREDICTION OF GAINS AND LOSSES


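The objective of a course is to increase students’
proficiency. Let us write $\theta_{pre}$ for the proficiency of a student
before instruction and $\theta_{post}$ for the proficiency after
instruction. By definition, the probability of choosing the correct
answer to question $i$ during the pre-test is $P_i(\theta_{pre})$. For the
same reason, this probability is $P_i(\theta_{post})$ for the post-test.
For a large group of students with the same proficiencies,
we get $I_i = 1 - P_i(\theta_{pre})$ and
$IC_i = (1 - P_i(\theta_{pre}))\, P_i(\theta_{post})$.
Inserting these expressions into the definitions of the gain
and the loss leads to

$$G = S_{post} - \frac{\langle \delta P_i(\theta_{pre})\, \delta P_i(\theta_{post}) \rangle}{1 - S_{pre}}, \tag{2}$$

$$L = (1 - S_{post}) - \frac{\langle \delta P_i(\theta_{pre})\, \delta P_i(\theta_{post}) \rangle}{S_{pre}}, \tag{3}$$

where $S_{pre} = S(\theta_{pre})$, $S_{post} = S(\theta_{post})$ and
$\delta P_i(\theta)$ is the difference between the probability of
success at question $i$ and the average test score $S$ for a given
proficiency:

$$\delta P_i(\theta) = P_i(\theta) - \langle P_i(\theta) \rangle. \tag{4}$$

By definition $\langle \delta P_i(\theta) \rangle = 0$. In the particular
case when $\theta_{pre} = \theta_{post}$ (i.e. when no instruction occurs),
$\langle \delta P_i\, \delta P_i \rangle$ is
the variance of the $P_i$’s for a given $\theta$ and is a characteristic
of the test.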
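The intermediate step behind Eqs. (2) and (3) is not written out in the text. A short derivation, using only the definitions above, the decomposition $P_i = S + \delta P_i$ and the property $\langle \delta P_i \rangle = 0$, runs as follows (the symbol $CI_i$ for the correct-to-incorrect proportion follows the notation of Sec. II):

```latex
\begin{align*}
\langle IC_i \rangle
  &= \langle (1 - P_i(\theta_{pre}))\, P_i(\theta_{post}) \rangle
   = S_{post} - \langle P_i(\theta_{pre})\, P_i(\theta_{post}) \rangle, \\
\langle P_i(\theta_{pre})\, P_i(\theta_{post}) \rangle
  &= S_{pre}\, S_{post}
   + \langle \delta P_i(\theta_{pre})\, \delta P_i(\theta_{post}) \rangle, \\
G = \frac{\langle IC_i \rangle}{\langle I_i \rangle}
  &= \frac{S_{post}\,(1 - S_{pre})
     - \langle \delta P_i(\theta_{pre})\, \delta P_i(\theta_{post}) \rangle}{1 - S_{pre}}
   = S_{post}
     - \frac{\langle \delta P_i(\theta_{pre})\, \delta P_i(\theta_{post}) \rangle}{1 - S_{pre}}, \\
L = \frac{\langle CI_i \rangle}{\langle C_i \rangle}
  &= \frac{S_{pre}\,(1 - S_{post})
     - \langle \delta P_i(\theta_{pre})\, \delta P_i(\theta_{post}) \rangle}{S_{pre}}
   = (1 - S_{post})
     - \frac{\langle \delta P_i(\theta_{pre})\, \delta P_i(\theta_{post}) \rangle}{S_{pre}}.
\end{align*}
```

The second line is where the cross terms $S_{pre} \langle \delta P_i(\theta_{post}) \rangle$ and $S_{post} \langle \delta P_i(\theta_{pre}) \rangle$ drop out.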

Equations (2) and (3) show that IRT enables us to
predict measured values of $G$ and $L$ once $\theta_{pre}$, $\theta_{post}$ and
all the $P_i$’s are known. However, the data of Lasry et al. [13]
give values of $G$ and $L$ as functions of $S_{pre}$, so information
about $\theta_{pre}$, $\theta_{post}$ and all the $P_i$ functions is missing.

First, the $P_i$ functions are taken from the literature. Using the
three-parameter model, Wang and Bao [9] performed an
IRT analysis of the FCI using their own database of 2 800
students’ answers, leading to the knowledge of the 30 $P_i$
functions. The measurements obtained by Wang and Bao
with their students can be used for any students because
the characteristics of the questions are independent of the
population used to obtain them. This property is known as
parameter invariance HZ]. Hence their $P_i$ functions are
used here.
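Once the item curves are fixed, evaluating $G$ and $L$ for a pair ($\theta_{pre}$, $\theta_{post}$) is a short computation. The sketch below is a hedged illustration: the three items are made up and merely stand in for the 30 FCI items of Ref. [9], and the function names are assumptions of this example. It computes the gain and the loss both directly from their definitions and through Eqs. (2) and (3), so the two routes can be compared.

```python
import math
from typing import List, Tuple

Item = Tuple[float, float, float]  # (a_i, b_i, c_i)

# Hypothetical item parameters standing in for published FCI values.
ITEMS: List[Item] = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (2.0, 1.0, 0.2)]


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter item characteristic curve of Eq. (1)."""
    x = 1.7 * a * (theta - b)
    if x >= 0:
        logistic = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        logistic = e / (1.0 + e)
    return c + (1.0 - c) * logistic


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def gain_loss_direct(theta_pre: float, theta_post: float, items: List[Item]) -> Tuple[float, float]:
    """G = <IC_i>/<I_i> and L = <CI_i>/<C_i> from the definitions."""
    pre = [p_correct(theta_pre, *item) for item in items]
    post = [p_correct(theta_post, *item) for item in items]
    ic = mean([(1.0 - p) * q for p, q in zip(pre, post)])
    ci = mean([p * (1.0 - q) for p, q in zip(pre, post)])
    return ic / (1.0 - mean(pre)), ci / mean(pre)


def gain_loss_from_eqs(theta_pre: float, theta_post: float, items: List[Item]) -> Tuple[float, float]:
    """Same quantities through Eqs. (2) and (3)."""
    pre = [p_correct(theta_pre, *item) for item in items]
    post = [p_correct(theta_post, *item) for item in items]
    s_pre, s_post = mean(pre), mean(post)
    cross = mean([(p - s_pre) * (q - s_post) for p, q in zip(pre, post)])
    return s_post - cross / (1.0 - s_pre), (1.0 - s_post) - cross / s_pre


print(gain_loss_direct(-0.5, 0.5, ITEMS))
print(gain_loss_from_eqs(-0.5, 0.5, ITEMS))
print(gain_loss_direct(0.0, 0.0, ITEMS))  # no learning: G and L are still non-zero
```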

Secondly, for each value of $S_{pre}$ we estimated $S_{post}$
from the data of Lasry et al. [13] using

$$S_{post} = S_{pre}\,(1 - L) + (1 - S_{pre})\, G, \tag{5}$$

which comes from the definitions of $G$ and $L$ and the fact
that $S_{pre} = \langle C_i \rangle$.

Finally, $\theta_{pre}$ and $\theta_{post}$ are estimated by inverting
the relation giving $S$ as a function of $\theta$:
$S(\theta) = \langle P_i(\theta) \rangle$.
This is an approximation in which the observed raw score
is assumed to be equal to the true score. The sample of
Lasry et al. [13] contains 13 000 students divided into 9
bins, leading to an average of 1400 students for each
raw-score bin. In this case the hypothesis of equating the raw
score to the true score seems reasonable.

Figure 1 shows that eqs. (2) and (3) match the experimental
measurements fairly well, indicating that IRT is
able to correctly predict gains and losses. Discrepancies
can be attributed both to uncertainties in the
measurements of the $P_i$ and to imperfect parameter invariance.
Such a case can occur in particular when the
hypothesis of unidimensionality does not hold. As shown by
Scott and Schumayer [3], while a unique proficiency can
be used to describe students’ characteristics, a 5-dimensional
model seems preferable. Our results show that
a one-dimensional model is able to give the global
tendency for the gain and the loss. A more detailed analysis
is left for future work.

As seen in Fig. 1, the gain is an increasing function of the
student’s initial score. A tempting interpretation is to say
that students with higher initial knowledge learn more
than students with lower initial knowledge. The reverse
holds for the loss: students with higher initial knowledge
have a lower loss than students with lower initial
knowledge. However, this argument implicitly assumes that
gains and losses are zero when no learning occurs. We
now show that this is not the case, which at least casts
doubt on the previous conclusion. To do so, we use IRT
to estimate $G$ and $L$ when $\theta_{post} = \theta_{pre}$, using
equations (2) and (3). Results are plotted in Fig. 3, which
clearly shows that even when no learning occurs the gain is
an increasing function of the pre-test score and rises up
to one. Similarly, the loss goes down from one to zero as
the pre-test score increases. For a pre-test score of
50%, both gains and losses have the same value, around
35%. Such a change in students’ answers to the same
question has been observed between two successive
administrations of the FCI [12]. Reported values of gains and
losses were 18% and 20% for a population mean score of 47%.
The discrepancy between these experimental measurements and
the IRT prediction could largely be attributed to a memory
effect: students took the test twice in the same
week, so they may have memorized some of their initial
answers. By contrast, our IRT model assumes that
test and retest are independent, i.e. that students
have not memorized any of their previous answers.


V. PROPORTION OF ANSWER CHANGES

In order to understand why gains and losses can have
such high values even when no learning occurs, we focus
directly on the global proportion of answer changes. In
a test-retest situation, we have:

$$\langle IC_i \rangle = \langle CI_i \rangle = S\,(1 - S) - \langle \delta P_i^2 \rangle, \tag{6}$$

where $S = S_{pre} = S_{post}$. The explicit dependence of $S$
and $\delta P_i$ on $\theta_{pre} = \theta_{post}$ has been omitted for clarity.
The first term on the right-hand side of equation (6) is
a parabolic function of $S$ and does not depend on the
considered test. Hence this part is identical for any
conceptual test. The second term on the right-hand side of
equation (6) depends on the item characteristic curves
and consequently on the test.

FIG. 3. Gain (blue lines) and loss (yellow lines) as a function
of pre-test score at the FCI. Continuous lines are IRT
predictions when learning occurs, dashed lines are IRT predictions
when no learning occurs (i.e. assuming $\theta_{post} = \theta_{pre}$).

FIG. 4. Proportion of answer changes from a right (resp.
wrong) answer to a wrong (resp. right) answer (continuous
line) for the FCI. The dashed line is $S(1 - S)$.

Values of $\langle IC_i \rangle$ have been
plotted for the FCI as a function of the score in Fig. 4. It
is clear that in this case the contribution of
$\langle \delta P_i^2 \rangle$, while
not negligible, is rather small. Consequently, for a group
of students with a true score of 50 %, nearly 18 % of
answers change from correct (resp. incorrect) to incorrect
(resp. correct) in a test-retest situation. This result has
a consequence for the reliability of the test and for the
interpretation of gains and losses. In order to interpret
gains and losses in terms of learning outcome, their values
should be as small as possible in a test-retest situation.
As a consequence, values of $\langle IC_i \rangle$ should also be as small
as possible. Because the first term of equation (6) does
not depend on the test, one can only influence the
$\langle \delta P_i^2 \rangle$ term, making it as high as possible
(so that $\langle IC_i \rangle$ decreases). It immediately leads to the
conclusion that one has to choose the questions, and therefore
the $P_i$ functions, so as to maximize the values of
$\langle \delta P_i^2 \rangle$ for all $\theta$.
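Equation (6) can be checked numerically. The sketch below uses three made-up items (the parameters are assumptions, not FCI values) and compares the direct average of $P_i(1 - P_i)$ over the questions with $S(1 - S)$ minus the variance of the $P_i$. It also returns the no-learning gain and loss, obtained by dividing the proportion of change by $1 - S$ and $S$ respectively.

```python
import math
from typing import Dict, List, Tuple

Item = Tuple[float, float, float]  # (a_i, b_i, c_i)

# Hypothetical item parameters, for illustration only.
ITEMS: List[Item] = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (2.0, 1.0, 0.2)]


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter item characteristic curve of Eq. (1)."""
    x = 1.7 * a * (theta - b)
    if x >= 0:
        logistic = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        logistic = e / (1.0 + e)
    return c + (1.0 - c) * logistic


def no_learning_summary(theta: float, items: List[Item]) -> Dict[str, float]:
    """Test-retest quantities at fixed proficiency (theta_pre = theta_post)."""
    probs = [p_correct(theta, a, b, c) for a, b, c in items]
    n = len(probs)
    score = sum(probs) / n
    variance = sum((p - score) ** 2 for p in probs) / n
    change = sum(p * (1.0 - p) for p in probs) / n   # <IC_i> = <CI_i>
    return {
        "score": score,
        "parabola": score * (1.0 - score),   # test-independent term of Eq. (6)
        "variance": variance,                # test-dependent term of Eq. (6)
        "change": change,
        "gain": change / (1.0 - score),
        "loss": change / score,
    }


for theta in (-1.0, 0.0, 1.5):
    summary = no_learning_summary(theta, ITEMS)
    print({key: round(value, 3) for key, value in summary.items()})
```

The larger the spread of the $P_i$ at a given $\theta$, the further the proportion of change falls below the parabola $S(1 - S)$.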

In order to understand how to choose those $P_i$
functions, we consider the simple case of a test with only
3 questions. Three different cases are considered, each
one corresponding to a particular set of $P_i$ functions.
The three cases are named test A, B and C and their
item characteristic functions are plotted in Fig. 5 (left
column). For each $\theta$, the proportion of answer changes
is given by $\langle CI_i \rangle = \langle P_i\,(1 - P_i) \rangle$,
where $\langle \cdot \rangle$ denotes the
average over the 3 questions of the test. Hence, each
individual question $i$ has a contribution of $P_i\,(1 - P_i)$.
This contribution vanishes when $P_i = 0$ or 1 and has a
maximal value of 0.25 when $P_i = 0.5$.

Test A has three questions whose characteristic curves
overlap for a wide range of $\theta$. As a consequence, for a
wide range of $\theta$ all individual questions contribute to
the proportion of answers that change. For instance, for a
true score of 50% ($\theta = 0$), $P_1(0) = 0.88$, $P_2(0) = 0.5$ and
$P_3(0) = 0.12$, leading to $P_1(1 - P_1) = P_3(1 - P_3) \approx 0.1$
and $P_2(1 - P_2) = 0.25$. Hence, for a score of 50%, the
proportion of change, which is the average of these three
values, is about 15%. The curve of $\langle IC_i \rangle$ is
very similar to the one obtained for the FCI, indicating
that many item characteristic curves of the FCI overlap,
as already noted in previous studies analyzing the FCI
using unidimensional IRT [zHn].

In contrast, test C has three questions whose
characteristic curves do not overlap, i.e. the ranges of $\theta$ over
which these functions go from a value close to 0 to a value close
to 1 are well separated (see Fig. 5). As a consequence,
each question contributes separately to the proportion
of answer changes. For instance, for a true score of 50%
($\theta = 0$), $P_1(0) \approx 1$, $P_2(0) = 0.5$ and $P_3(0) \approx 0$, leading
to $P_1(1 - P_1) = P_3(1 - P_3) \approx 0$ and $P_2(1 - P_2) = 0.25$.
Hence, for a score of 50%, the proportion of change,
which is the average of the $P_i(1 - P_i)$ values, is $0.25/3 \approx 0.08$.
This value is much smaller than for test A. In a test
with $N$ separated questions, the maximal value of $\langle IC_i \rangle$ is
$0.25/N$ and is obtained for values of $S = 0.5/N, 1.5/N,
\ldots, (N - 0.5)/N$. In a test with $N = 30$ separated
questions, the maximal value of $\langle IC_i \rangle$ is about 1%. Hence
changes of answers occur very rarely, and the values of gains
and losses remain very small.

Finally, test B shows the transition between test A and
the extreme case of test C.
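The arithmetic for tests A and C, and the $0.25/N$ bound for separated questions, can be reproduced in a few lines. In the sketch below, the probabilities of test A are the values quoted in the text, test C is taken in its idealized limit (probabilities exactly 1, 0.5 and 0), and `separated_test` is a hypothetical helper that builds such an idealized test with $N$ questions.

```python
from typing import List


def change_proportion(probs: List[float]) -> float:
    """Average of P_i (1 - P_i) over the questions of the test."""
    return sum(p * (1.0 - p) for p in probs) / len(probs)


def separated_test(n_items: int, k: int) -> List[float]:
    """Idealized non-overlapping test seen by a student sitting in the middle
    of the transition of question k: the k easier questions are mastered,
    the harder ones are out of reach."""
    if not 0 <= k < n_items:
        raise ValueError("k must satisfy 0 <= k < n_items")
    return [1.0] * k + [0.5] + [0.0] * (n_items - k - 1)


test_a = [0.88, 0.5, 0.12]   # values quoted in the text for a true score of 50%
test_c = [1.0, 0.5, 0.0]     # idealized limit of test C at the same score

print(f"test A: {change_proportion(test_a):.3f}")
print(f"test C: {change_proportion(test_c):.3f}")

probs_30 = separated_test(30, 14)
print(f"N = 30: score = {sum(probs_30) / 30:.3f}, change = {change_proportion(probs_30):.4f}")
```

For $N = 30$ the proportion of change at its maximum is $0.25/30 \approx 0.8\%$, which is the "about 1%" quoted above.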


VI. INTERPRETATION OF GAINS AND 
LOSSES WHEN LEARNING OCCURS 

According to the discussion of the previous section, the
interpretation of gains and losses should be separated
into two extreme cases: when a wide majority of item
characteristic curves overlap, as in test A, and when
none of the item characteristic curves overlap, as in
test C.

FIG. 5. Each row corresponds to a given test (A, B or C) comprising 3 questions. Left: item characteristic curves of the three
questions (dashed lines) and true score (continuous lines) as functions of proficiency $\theta$. Right: proportion of answer changes
$\langle IC_i \rangle = \langle CI_i \rangle$ (continuous line) and $S(1 - S)$ (dashed line) as functions of the true score $S$.

In the first case, $\langle \delta P_i^2 \rangle$ is small and equations (2) and (3)
reduce to $G = S_{post}$ and $L = 1 - S_{post}$. Hence the gain
is more or less the post-test score and does not add any
supplementary information on students’ learning. One
may still want to isolate the part of the gain due to
instruction by defining $\Delta G = G_{learning} - G_{no\ learning}$. In the
case of a type A test, $\Delta G = S_{post} - S_{pre} = g_{raw}$, leading to
the so-called raw gain (because $G = S_{pre}$ when no
learning occurs). The analysis of the data of Lasry et al. [13] shows
that $g_{raw}$ is a decreasing function of the pre-test score.
Does it mean that students with lower initial knowledge
gain more than students with higher initial knowledge?
No, because a student’s post-test score is limited to 100%, so
the raw gain $g_{raw}$ tends to zero when the pre-test score
tends to 100%. Also, the score is an ordinal scale and
not an interval scale Hi]. As a consequence, the raw
score can only lead to a ranking of students, but an
increase of 1 point for a student with a low initial score
does not reflect the same learning as an increase of 1
point for a student with a high initial score. A correct
comparison of progress has to invoke an interval scale
such as the student proficiency $\theta$ introduced in the
previous sections HE]. Fig. 6 plots the raw gain as a function
of the pre-test score for given values of the students’ learning
increase $\Delta\theta$. As seen in this figure, a given value of $g_{raw}$
corresponds to various values of student progression $\Delta\theta$,
depending on the student’s initial score.

FIG. 6. Evolution of the raw gain with the initial pre-test score
for three fixed values of the students’ learning $\Delta\theta$. The raw gain
corresponds to $\Delta G$ for a type A test.

In the second case (test of type C), where all
questions are well separated, the proportion of answers that
change when no learning occurs is nearly zero: it is
lower than 5% for $N > 5$. Assuming a positive student
progression $\Delta\theta = \theta_{post} - \theta_{pre}$ greater than the error range
of all questions (i.e. $\forall i$, $\Delta\theta \gg 1/a_i$, with $a_i$ the
discrimination power), the number of answers that change from
incorrect to correct is $S_{post} - S_{pre}$, leading for the gain to

$$G \approx \frac{S_{post} - S_{pre}}{1 - S_{pre}} = G_{Hake}.$$

Interestingly, one recovers in this limit the Hake
gain [2], which can be interpreted as the proportion of
questions changing from incorrect to correct in a test
comprising separated item response curves (like test C).
The number of answers that change from correct to
incorrect is zero and $L = 0$. However, like the raw gain, the
Hake gain is not an interval scale [6] and has to be taken
with due care when comparing students’ progression, as
already emphasized. To illustrate this, let us consider a
hypothetical test where the true score is a logistic
function of the proficiency: $S = (1 + \exp(-\theta))^{-1}$. This
model is characteristic of a test where the questions’
difficulties are distributed over the proficiency scale following
a Gaussian law: there are few easy questions, few hard
questions and a wide majority of questions with an
intermediate level of difficulty. The Hake gain is plotted
in Fig. 7 as a function of the pre-test score for various
fixed values of $\Delta\theta$ that are typical of
students’ learning (see for instance Fig. 8 for typical
values of $\Delta\theta$ in a mechanics course). As can be seen, the
gain is an increasing function of the pre-test score for a
fixed value of the students’ learning. Hence, the fact that
the gain is larger for students with a high initial level than for
students with a low initial level does not necessarily reveal that
the former have learned more.
Moreover, a given value of $G$ corresponds to various values of
student progression $\Delta\theta$, depending on the student’s
initial score. As shown in Fig. 7, a fixed value of the
gain, for instance 0.34, corresponds to strong
learning for a low pre-test score ($\Delta\theta = 2$ for $S = 8\%$), medium
learning for a medium pre-test score ($\Delta\theta = 1$ for $S = 30\%$)
and weak learning for a high pre-test score ($\Delta\theta = 0.5$ for
$S = 80\%$). This clearly shows that the Hake gain should
not be used to compare students’ progression when they
have different pre-test scores, even in a test of type C.
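The three operating points quoted for $G = 0.34$ can be reproduced directly from the logistic true score $S = (1 + \exp(-\theta))^{-1}$ of this hypothetical test. The sketch below is a minimal illustration; the function names are assumptions of this example.

```python
import math


def true_score(theta: float) -> float:
    """Logistic true score of the hypothetical test: S = 1 / (1 + exp(-theta))."""
    return 1.0 / (1.0 + math.exp(-theta))


def proficiency(score: float) -> float:
    """Inverse of true_score (the logit), defined for 0 < score < 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must lie strictly between 0 and 1")
    return math.log(score / (1.0 - score))


def hake_gain(s_pre: float, delta_theta: float) -> float:
    """Hake gain (S_post - S_pre) / (1 - S_pre) for a proficiency increase delta_theta."""
    s_post = true_score(proficiency(s_pre) + delta_theta)
    return (s_post - s_pre) / (1.0 - s_pre)


for s_pre, delta_theta in [(0.08, 2.0), (0.30, 1.0), (0.80, 0.5)]:
    print(f"S_pre = {s_pre:.2f}, delta_theta = {delta_theta}: G = {hake_gain(s_pre, delta_theta):.3f}")

# At fixed delta_theta the Hake gain grows with the pre-test score.
print([round(hake_gain(s, 1.0), 3) for s in (0.1, 0.3, 0.5, 0.7, 0.9)])
```

The three pairs give Hake gains within 0.01 of 0.34, although the underlying proficiency increases differ by a factor of four.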

FIG. 7. Evolution of the Hake gain with the initial pre-test
score for three fixed values of the students’ learning $\Delta\theta$. The green
dashed line is $G = 0.34$ and corresponds to $\Delta\theta = 2$ for $S = 8\%$,
$\Delta\theta = 1$ for $S = 30\%$ and $\Delta\theta = 0.5$ for $S = 80\%$. The Hake
gain corresponds to $\Delta G$ for a type C test.

Table I summarizes the values of $G$ and $L$ for the two limit
cases. As can be seen, $\Delta G$ reduces to the raw gain for
type A tests and to the Hake gain for type C tests.

Type of test    $G$           $L$                $\Delta G$
A               $S_{post}$    $1 - S_{post}$     $g_{raw}$
C               $G_{Hake}$    $0$                $G_{Hake}$

TABLE I. Summary of gains and losses for the different types
of test. $\Delta G = G_{learning} - G_{no\ learning}$ is the difference in
gain between a situation when learning occurs and a
situation when no learning occurs, that is to say the part of the
gain which is due to learning.

FIG. 8. Evolution of students’ learning ($\Delta\theta = \theta_{post} - \theta_{pre}$)
with the pre-test score, evaluated from the data of Lasry et al. [13].

We conclude this section by discussing the efficiency
of instruction with respect to the initial level of the
students. As already emphasized, the proficiency $\theta$ has good
properties mi and hence could be used to determine
the learning $\Delta\theta$ of a student, $\Delta\theta = \theta_{post} - \theta_{pre}$. This
increase of proficiency is plotted in Fig. 8 as a function
of the pre-test score for the data of Lasry et al. [13]. We
have evaluated $\theta$ from the scores by inverting the
relation $S(\theta)$. According to Lasry et al. [13], uncertainties
on pre-test scores, gains and losses are about 2%, leading
to uncertainties on the post-test score of the same order
of magnitude. These uncertainties lead to uncertainties
on the proficiencies, particularly for low or high scores
due to the ‘S’ shape of the curve, and are represented in
Fig. 8. If $\theta$ is assumed to be the right scale for measuring
learning, Fig. 8 clearly shows that learning decreases
as the pre-test score increases. This conclusion is opposite
to the first interpretation of the evolution of
gains and losses with pre-test score, but is in accordance
with the evolution of $g_{raw}$ with pre-test score. It seems
to indicate that our teaching methods are more efficient for
students with low prior knowledge. We recall that this
result is based on data from more than 13,000 students
who had taken the FCI at the beginning and at the end
of an introductory physics course in a large variety of
institutions: US high schools (10,007), three Canadian
two-year colleges (971), a US public university (1560)
and three top-tier private universities (884) [13]. Due to
possible correlations between students’ prior knowledge
and students’ institutions, this could reflect a difference
between institutions. But it could also mean that it is
more difficult in an introductory physics course to give
the same increase of learning to students with a high level of
prior knowledge than to students with a low level of prior
knowledge. This discussion is beyond the scope of this
article, but in order to answer this question one would
have to evaluate $\Delta\theta$ for each student in a group
following the same course with the same teacher, plot the
same curve as in Fig. 8 and finally perform a comparison
across institutions.
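Since $S(\theta)$ is a monotonically increasing function of $\theta$ (for positive discriminations), the inversion used to obtain $\theta$ from a score can be done by simple bisection. The sketch below is one possible way to do it, not necessarily the procedure used for Fig. 8: the items, the search bracket and the tolerance are all assumptions of this example.

```python
import math
from typing import List, Tuple

Item = Tuple[float, float, float]  # (a_i, b_i, c_i)

# Hypothetical item parameters, for illustration only.
ITEMS: List[Item] = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (2.0, 1.0, 0.2)]


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter item characteristic curve of Eq. (1)."""
    x = 1.7 * a * (theta - b)
    if x >= 0:
        logistic = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        logistic = e / (1.0 + e)
    return c + (1.0 - c) * logistic


def true_score(theta: float, items: List[Item]) -> float:
    return sum(p_correct(theta, a, b, c) for a, b, c in items) / len(items)


def proficiency_from_score(score: float, items: List[Item],
                           lo: float = -10.0, hi: float = 10.0,
                           tol: float = 1e-10) -> float:
    """Invert S(theta) by bisection on the bracket [lo, hi]."""
    if not true_score(lo, items) < score < true_score(hi, items):
        raise ValueError("score is outside the range reachable on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Proficiency increase for a hypothetical group going from 40% to 60%.
theta_pre = proficiency_from_score(0.40, ITEMS)
theta_post = proficiency_from_score(0.60, ITEMS)
print(f"delta_theta = {theta_post - theta_pre:.3f}")
```

Note that with a non-zero guessing parameter, scores at or below the average guessing level cannot be inverted, which is one reason why proficiency estimates become uncertain at the low end of the score range.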


VII. CONCLUSION 

We have shown that IRT predicts fairly well the
experimental measurements of gains and losses with the
FCI when learning occurs. In addition, IRT shows that
the values of gains and losses for the FCI are rather high
even when no learning occurs, the reason being that
the item characteristic curves overlap. All errors associated
with individual questions contribute together to the
probability of answer changes, which makes gains and losses
difficult to interpret. In such a case the gain is more
or less the post-test score and does not reveal that
students with a high initial level have learned more than
students with a low initial level.

In the case where item characteristic curves do not
overlap, answer changes are very rare, the gain reduces
to the Hake gain and the losses drop to zero.

We have shown that the effect of instruction can be
assessed by looking at the proficiency increase instead of
the gain. The proficiency increases
more for low-level students (i.e. those with a low pre-test score).